Abstract

Algorithms on grammars/transducers with context-free derivations: hypergraph reachability, shortest path, and
inside-outside pruning of 'relatively useless' arcs that are unused by any near-shortest path.


1 Introduction 

We present algorithms on context-free grammars (and also on hypergraphs and regular tree grammars, which share
the same context-free derivation rule): hypergraph reachability, shortest path, and inside-outside pruning of 'relatively
useless' arcs that are unused by any near-shortest path. Section 2 is optional for those already familiar with regular
tree grammars (analogous to derivation trees of context-free grammars) and/or hypergraphs.


2 Notation 

2.1 Strings 

$\Sigma^*$ are the strings over alphabet $\Sigma$. For $s = (s_1, \ldots, s_n)$ the length of $s$ is $|s| = n$ and the $i$th letter is $s[i] = s_i$,
for all $i \in \mathrm{indices}_s = \{i \in \mathbb{N} \mid 1 \le i \le n\}$, and the selection of a sequence of letters by index is
$s[(I_1, \ldots, I_n) \in \mathrm{indices}_s^*] = (s[I[1]], \ldots, s[I[n]])$. Concatenation of strings is specified by the $\cdot$ operator, where
$a \cdot b = (a[1], \ldots, a[|a|], b[1], \ldots, b[|b|])$.

2.2 Multisets 

A multiset $M$ of $S$ is a partial function $M : S \to \mathbb{N}$, or equivalently, a functional binary relation $M \subseteq S \times \mathbb{N}$. The class
of multisets of $S$ is written $\mathcal{M}(S)$. If $M(x) = m \in \mathbb{N}$, we say $(x, m) \in M$, $x \in M$, and the multiplicity of $x$ in $M$ is
$m$. Intuitively, the multiplicity is the number of times an element occurs. The domain of $M$ is $\mathrm{dom}\, M = \{x \mid x \in M\}$. In
some cases it is convenient to interpret $M$ as a total function $S \to (\mathbb{N} \cup \{0\})$ where $M(x \notin \mathrm{dom}\, M) = 0$. A set
$S$ can be interpreted as a multiset where each $x \in S$ has multiplicity $S(x) = 1$. A sequence $V = (v_1, \ldots, v_n) \in S^*$
can also be seen as a multiset with $V(x) = \sum_{i : v_i = x} 1$ (after all, another notation for a multiset is just a set listed
without removal of duplicates, e.g. $\{a, b, a\}$).
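The sequence-to-multiset reading above can be illustrated with a minimal Python sketch. It assumes `collections.Counter` as a stand-in for $\mathcal{M}(S)$; the helper names `multiset_of_sequence`, `multiplicity` and `dom` are illustrative and not part of the text.

```python
from collections import Counter

def multiset_of_sequence(seq):
    """V(x) = number of positions i with v_i = x."""
    return Counter(seq)

def multiplicity(m, x):
    """Total-function view: elements outside dom M have multiplicity 0."""
    return m[x]  # Counter yields 0 for a missing key and does not insert it

def dom(m):
    """The elements that occur at least once."""
    return {x for x, count in m.items() if count > 0}

M = multiset_of_sequence(("a", "b", "a"))
print(multiplicity(M, "a"), multiplicity(M, "c"), sorted(dom(M)))
```

Here the sequence $(a, b, a)$ and the listing $\{a, b, a\}$ denote the same multiset.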

2.3 Trees 

$T_\Sigma$ is the set of (rooted, ordered, labeled, finite) trees over alphabet $\Sigma$.

$T_\Sigma(X)$ are the trees over alphabet $\Sigma$, indexed by $X$: the subset of $T_{\Sigma \cup X}$ where only leaves may be labeled by $X$.
($T_\Sigma(\emptyset) = T_\Sigma$.) Leaves are nodes with no children.

*Work done at University of Southern California, Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292 
The nodes of a tree $t$ are identified one-to-one with its paths: $\mathrm{paths}_t \subset \mathrm{paths} = \mathbb{N}^* = \bigcup_{i=0}^{\infty} \mathbb{N}^i$ ($A^0 = \{()\}$). The
path to the root is the empty sequence $()$, and $p_1$ extended by $p_2$ is $p_1 \cdot p_2$, where $\cdot$ is concatenation.

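For $p \in \mathrm{paths}_t$, $\mathrm{rank}_t(p)$ is the number of children, or rank, of the node at $p$ in $t$, and $\mathrm{label}_t(p) \in \Sigma \cup X$
is its label. The root of $t$ is $\mathrm{root}(t) = \mathrm{label}_t(())$. The ranked label of a node is the pair $\mathrm{labelandrank}_t(p) =
(\mathrm{label}_t(p), \mathrm{rank}_t(p))$. For $1 \le i \le \mathrm{rank}_t(p)$, the $i$th child of the node at $p$ is located at path $p \cdot (i)$. The subtree at
path $p$ of $t$ is $t \downarrow p$, defined by $\mathrm{paths}_{t \downarrow p} = \{q \mid p \cdot q \in \mathrm{paths}_t\}$ and $\mathrm{labelandrank}_{t \downarrow p}(q) = \mathrm{labelandrank}_t(p \cdot q)$.
The children of $t$ are $\mathrm{children}_t \in T_\Sigma^*$, with $\mathrm{children}_t[i] = t \downarrow (i)$, $\forall 1 \le i \le \mathrm{rank}_t(())$.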

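The paths to $X$ in $t$ are $\mathrm{paths}_t(X) = \{p \in \mathrm{paths}_t \mid \mathrm{label}_t(p) \in X\}$. A frontier is a set of paths $f$ that are
pairwise prefix-independent:

$$\forall p_1, p_2 \in f,\ p \in \mathrm{paths} : p_1 = p_2 \cdot p \implies p_1 = p_2$$

A frontier of $t$ is a frontier $f \subseteq \mathrm{paths}_t$.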
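The prefix-independence condition can be checked directly. The following Python sketch assumes paths are tuples of 1-based child indices; `is_prefix` and `is_frontier` are illustrative names, not notation from the text.

```python
from itertools import combinations

def is_prefix(p, q):
    """True if path p is a prefix of path q (paths are tuples of child indices)."""
    return len(p) <= len(q) and q[:len(p)] == p

def is_frontier(paths):
    """A frontier is a set of paths in which no path is a proper prefix of another."""
    distinct = set(paths)  # distinct paths, so any prefix found below is proper
    return all(not is_prefix(p, q) and not is_prefix(q, p)
               for p, q in combinations(distinct, 2))

print(is_frontier({(1,), (2, 1)}), is_frontier({(1,), (1, 2)}))
```

The first set is a frontier; the second is not, because $(1)$ is a prefix of $(1, 2)$.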

For $t, s \in T_\Sigma(X)$, $p \in \mathrm{paths}_t$, $t[p \leftarrow s]$ is the substitution of $s$ for $p$ in $t$, where the subtree at path $p$ is replaced by
$s$. For a frontier $f$ of $t$, the mass substitution of $X$ for the frontier $f$ in $t$ is written $t[p \leftarrow X_p, \forall p \in f]$ and is equivalent
to substituting the $X_p$ for the $p$ serially in any order.
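As a small illustration of paths, subtrees and substitution, the sketch below represents a tree as a pair `(label, children)` with `children` a tuple. That encoding and the function names are assumptions made for the example only.

```python
def paths(t):
    """All paths of t, root first; children are numbered from 1."""
    _, children = t
    result = [()]
    for i, child in enumerate(children, start=1):
        result.extend((i,) + q for q in paths(child))
    return result

def subtree(t, p):
    """The subtree of t at path p."""
    for i in p:
        t = t[1][i - 1]
    return t

def substitute(t, p, s):
    """t[p <- s]: a copy of t with the subtree at path p replaced by s."""
    if not p:
        return s
    label, children = t
    i = p[0]
    new_child = substitute(children[i - 1], p[1:], s)
    return (label, children[:i - 1] + (new_child,) + children[i:])

t = ("S", (("A", ()), ("f", (("B", ()),))))
a, b = ("a", ()), ("b", ())
# (1,) and (2, 1) form a frontier, so the two substitutions commute.
one_way = substitute(substitute(t, (1,), a), (2, 1), b)
other_way = substitute(substitute(t, (2, 1), b), (1,), a)
print(paths(t), one_way == other_way)
```

Because the two paths are prefix-independent, neither substitution moves or removes the other's target, which is why the serial order does not matter.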

The yield of $X$ in $t$ is $\mathrm{yield}_t(X)$, the string formed by reading out the leaves labeled with $X$ in left-to-right order.
The usual case (the yield of $t$) is $\mathrm{yield}_t = \mathrm{yield}_t(\Sigma)$.

We may also consider the monadic strings in $t$, $\mathrm{mstrings}_t \subseteq \Sigma^*$, obtained by reading off the labels along some
path from the root down. The paths that read off a monadic string $s$ in $t$ are $\mathrm{mpaths}_t(s) = \{p \in \mathrm{paths}_t \mid \forall 1 \le i \le
|p| + 1 : \mathrm{label}_t(p \uparrow (1, i)) = s[i]\}$, and the string of labels along a path is $\mathrm{mstring}_t(p \in \mathrm{paths}_t) = (\mathrm{label}_t(p \uparrow
(1, i)))_{1 \le i \le |p| + 1}$ (so $\forall p \in \mathrm{mpaths}_t(s) : \mathrm{mstring}_t(p) = s$). Then $\mathrm{mstrings}_t = \{\mathrm{mstring}_t(p \in \mathrm{paths}_t)\}$ and $t \downarrow s$ is the
sequence of subtrees of $t$ along the monadic string $s$ (in lexicographic path order):

$$t \downarrow (s \in \mathrm{mstrings}_t) = \mathop{\bullet}_{p \in \mathrm{mpaths}_t(s)\ \text{in lexicographic order}} (t \downarrow p)$$

Naturally, the path in $t$ to the $i$th element of $t \downarrow s$ is the $i$th (in lexicographic order) element of $\mathrm{mpaths}_t(s)$.

2.4 Regular Tree Grammars 

A weighted regular tree grammar (wRTG) $G$ is a quadruple $(\Sigma, N, S, P)$, where $\Sigma$ is the alphabet, $N$ is the finite
set of nonterminals, $S \in N$ is the start (or initial) nonterminal, and $P \subseteq N \times T_\Sigma(N) \times \mathbb{R}^+$ is the finite set of
weighted productions ($\mathbb{R}^+ = \{r \in \mathbb{R} \mid r > 0\}$). We define the binary relation $\Rightarrow_G$ (single-step derives in $G$) on
$T_\Sigma(N) \times (\mathrm{paths} \times P)^*$, pairs of trees and derivation histories, which are logs of (location, production used):

$$\Rightarrow_G = \{((a, h), (b, h \cdot (p, (l, r, w)))) \mid (l, r, w) \in P \wedge p \in \mathrm{paths}_a(\{l\}) \wedge b = a[p \leftarrow r]\}$$

where $(a, h) \Rightarrow_G (b, h \cdot (p, (l, r, w)))$ iff tree $b$ may be derived from tree $a$ by using the rule $l \to^{w} r$ to replace the
nonterminal leaf $l$ at path $p$ with $r$. For a derivation history $h = ((p_1, (l_1, r_1, w_1)), \ldots, (p_n, (l_n, r_n, w_n)))$, the weight
of $h$ is $w(h) = \prod_{i=1}^{n} w_i$, and we call $h$ leftmost if $L(h) \equiv \forall 1 \le i < n : p_{i+1} \not<_{\mathrm{lex}} p_i$.*

The reflexive, transitive closure of $\Rightarrow_G$ is written $\Rightarrow_G^*$ (derives in $G$), and the restriction of $\Rightarrow_G^*$ to leftmost
derivation histories is $\Rightarrow_G^{L*}$ (leftmost derives in $G$).

The weight of $a$ becoming $b$ in $G$ is $w_G(a, b) = \sum_{h : (a, ()) \Rightarrow_G^{L*} (b, h)} w(h)$, the sum of weights of all unique (leftmost)
derivations transforming $a$ to $b$, and the weight of $t$ in $G$ is $W_G(t) = w_G(S, t)$. The weighted regular tree language
produced by $G$ is $L_G = \{(t, w) \in T_\Sigma \times \mathbb{R}^+ \mid W_G(t) = w\}$.

The derivation tree grammar for a wRTG $G = (\Sigma, N, S, P)$ is $DG(G) = (P, N, S, P')$, where

$$P' = \{(l, p(\mathrm{yield}_r(N)), w) \mid p = (l, r, w) \in P\}$$

($p((s_1, \ldots, s_n) \in N^*)$ is the tree with root label $p$, rank $n$, and $i$th child leaf $s_i$.) The produced trees are called
derivation trees and correspond one-to-one with tree-producing derivations in $G$.

*$() <_{\mathrm{lex}} (a)$, $(a_1) <_{\mathrm{lex}} (a_2)$ iff $a_1 < a_2$, and $(a_1) \cdot b_1 <_{\mathrm{lex}} (a_2) \cdot b_2$ iff $a_1 < a_2 \vee (a_1 = a_2 \wedge b_1 <_{\mathrm{lex}} b_2)$.
2.5 Hypergraphs


A (directed) hypergraph $G$ is a pair $G = (V, E)$ where $V$ is a set of vertices (or nodes) of $G$, and $E$ are the edges (or
hyperarcs) of $G$. An edge $e = (h_e \in V, T_e, c_e : \mathbb{R}^{|T_e|} \to \mathbb{R})$ has head $h_e$, tails $T_e$, and cost function $c_e$. The cost
function for an edge maps the costs of reaching its tails to the cost of reaching the head through that edge.

In a hypergraph, $T_e \subseteq V$: the tails are subsets of the vertices.

In an ordered multi-hypergraph, $T_e \in V^*$: the tails are ordered sequences.

Typically hyperarc cost functions are symmetric; if not, then the order of arguments is the same as the order of
tails, or for unordered hypergraphs, fixed by some arbitrary total order $<_G$ on $V$. The usual cost function is given by
$c_e(x_1, \ldots, x_n) = l_e + \sum_{i=1}^{n} x_i$, where $l_e$ is the length of the edge. A typical asymmetric cost function would combine
tail hyperpath costs with different weights for each tail.

We say there is a hyperpath from $X \subseteq V$ to $y \in V$ in $G = (V, E)$, written $X \leadsto_G y$, if $y \in X \vee \exists e \in E :
h_e = y \wedge \forall t \in T_e : X \leadsto_G t$. A hyperpath-tree $t \in (X \leadsto_G y)$ is a tree labeled by edges, corresponding to a proof
of $X \leadsto_G y$ (with a separate proof for each multiple occurrence of a tail vertex; note that the usual B-hyperpath allows
only a single incoming hyperarc/proof of each vertex, so our hyperpath-trees are more like derivations in a context-free
grammar). The cost of a hyperpath-tree $p$ is written $c(p)$ and is computed bottom-up for each subtree with root label $e$
using $c_e$.
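The bottom-up cost computation can be sketched as follows. This is a minimal illustration under stated assumptions: the usual additive cost function is used, source vertices are leaves of cost 0, and the edge table `EDGES` with its names and lengths is hypothetical.

```python
# Hypothetical edge table: name -> (head, tails, length l_e).
EDGES = {
    "e1": ("A", ("x",), 1.0),
    "e2": ("B", ("x", "x"), 0.5),
    "e3": ("y", ("A", "B"), 2.0),
}

def cost(tree, edges=EDGES):
    """Cost c(p) of a hyperpath-tree, computed bottom-up.

    A tree is either a source vertex (a string leaf, assumed to cost 0) or a
    pair (edge name, tuple of subtrees) with one subtree per tail of the edge.
    """
    if isinstance(tree, str):
        return 0.0
    name, subtrees = tree
    _, tails, length = edges[name]
    assert len(subtrees) == len(tails), "one subtree per tail occurrence"
    # Usual cost function: c_e(x_1, ..., x_n) = l_e + sum of the x_i.
    return length + sum(cost(s, edges) for s in subtrees)

# A proof of {x} ~> y: e3 needs A (via e1) and B (via e2, which uses x twice).
proof = ("e3", (("e1", ("x",)), ("e2", ("x", "x"))))
print(cost(proof))
```

Note that the tail `x` of `e2` is proved twice, once per occurrence, as in a grammar derivation.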

For any derivation grammar $G' = (P, N, S, P')$ of wRTG $G = (\Sigma, N, S, P)$, there is an equivalent ordered
multi-hypergraph $H = (N \cup \{\omega\}, E)$ with an edge $e \in E$ for each production $p = (l, r, w) \in P'$ such that $h_e = l$,

$$T_e = \begin{cases} \{\omega\} & \text{if } \mathrm{yield}_r(N) = \emptyset \\ \mathrm{yield}_r(N) & \text{otherwise,} \end{cases}$$

and the usual cost function with $l_e = -\ln w$. The hyperpath-trees $\{\omega\} \leadsto_H S$
are exactly the derivation trees for $G$, with the cost of the hyperpath-tree equal to the $-\ln$ of the weight of the tree
(the labels of the hyperpath-tree are $e \in E$ and the labels of the derivation tree are $p \in P$, but there is an
isomorphism between them, due to the construction of $E$).

A hypergraph $(V, E)$ may be interpreted as a multigraph $(V, E')$ with an edge for every tail of each hyperarc
($E' = \{(h_e, t \in T_e, c_e) \mid (h_e, T_e, c_e) \in E\}$). We can refer to simple (or monadic) paths corresponding to the usual
paths in the graph. In fact, monadic strings $s$ of hyperarcs from a hyperpath-tree for $(V, E)$ correspond to a simple
path $h_{s[|s|]} \leadsto_{(V, E')} h_{s[1]}$.
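The grammar-to-hypergraph construction can be sketched in Python. The rule encoding `(lhs, rhs_nonterminals, weight)`, the name `grammar_to_hypergraph` and the sink label `OMEGA` are assumptions for illustration; only the construction itself comes from the text.

```python
import math

OMEGA = "<omega>"  # fictitious sink vertex; the label is arbitrary

def grammar_to_hypergraph(rules):
    """rules: list of (lhs, rhs_nonterminals, weight) with weight > 0, where
    rhs_nonterminals is the left-to-right yield of nonterminals in the rhs.
    Returns (vertices, edges) with edges as (head, tails, length) triples."""
    vertices = {OMEGA}
    edges = []
    for lhs, rhs_nts, weight in rules:
        # A rule with no rhs nonterminals gets the sink as its only tail.
        tails = tuple(rhs_nts) if rhs_nts else (OMEGA,)
        vertices.add(lhs)
        vertices.update(rhs_nts)
        edges.append((lhs, tails, -math.log(weight)))  # l_e = -ln w
    return vertices, edges

rules = [("S", ("A", "A"), 0.5), ("S", (), 0.5), ("A", (), 1.0)]
V, E = grammar_to_hypergraph(rules)
for edge in E:
    print(edge)
```

Summing edge lengths along a hyperpath-tree then gives the negative log of the product of the rule weights used.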


3 Pruning Along a Hyperpath-Tree 


If we are only interested in hyperpath-trees $X \leadsto_G y$, we can prune $G$ along $X$ to $y$ by eliminating vertices and
hyperarcs that don't appear in any (cheap) hyperpath-tree. This is analogous to the problem of reducing a context-free
grammar by eliminating useless nonterminals (Hopcroft and Ullman, 1979), except that we wish to also eliminate
those useful only for high-cost hyperpath-trees.

Since we care only for the existence of a (cheapest) path for each node, tails of edges may be considered as
sets while addressing this problem, so that multiply appearing tails $t$ in a multi-hypergraph always reuse the same
hyperpath-tree $X \leadsto_G t$. We assume the cost function $c_e(c) = l_e + \sum_{(t, m) \in T_e} w_e(t)\, m\, c(t)$, where $c(t)$ is the cost due
to the hyperpath-tree $X \leadsto t$ and $w_e(t)$ is a weight given to $t$-tails of that edge.

Unweighted pruning consists of first eliminating vertices (and hyperarcs they occur in) that cannot be reached
from the start, and second, eliminating from the remainder all those that do not lie along any hyperpath-tree to the
destination. The first step can be performed in linear time by Algorithm 1.

The weighted version of Algorithm 1 establishes the lowest cost way of reaching each vertex from a start set (or
that there is none). Algorithm 2, adapted from (Knuth, 1977) (first published in (Knight and Graehl, 2005)), is an
extension of the graph shortest path problem (Dijkstra, 1959) to the hypergraph case. It works the same except that
vertices are visited in increasing order of the cost of reaching them from $X$, and so requires a priority queue. Activated
hyperarcs serve to potentially lower the cost of reaching their head, but visiting the head is deferred until it is certain
that its minimal cost hyperpath-tree is known. This is in contrast to the simple depth-first approach in the unweighted
case, where the head is visited immediately with a recursive function call (using the implicit program stack for queuing
nodes).
Algorithm 1: Single-source-set hypergraph reachability

Input:

A set of source nodes $X \subseteq V$ in a hypergraph $G = (V, E)$, nodes $V$, and hyperarcs $E = \{e_1, \ldots, e_m\}$ indexed
by $1 \le i \le m$. Each hyperarc has tail nodes $T_i = T_{e_i} \subseteq V$ and head $h_i = h_{e_i} \in V$.

Output:

For all $y \in V$, $B[y]$ = true if $X \leadsto_G y$, false otherwise. Time complexity is $O(t)$ where $t$ is the total size of the
input.

begin
    for $y \in V$ do
        $B[y] \leftarrow$ false
        $Adj[y] \leftarrow \{\}$
    for $1 \le i \le m$, index of a hyperarc $(T_i = \{x_1, \ldots, x_k\}) \to \{h_i\}$ do
        $r[i] \leftarrow k$
        /* r[i] is the number of tail nodes remaining before edge i fires. */
        for $1 \le j \le k$ do $Adj[x_j] \leftarrow Adj[x_j] \cup \{i\}$
    for $y \in X$ do REACH($y$)

REACH($y$) = begin
    if $\neg B[y]$ then
        $B[y] \leftarrow$ true
        for $i \in Adj[y]$ do
            if $\neg B[h_i]$ then
                $r[i] \leftarrow r[i] - 1$
                if $r[i] = 0$ then REACH($h_i$)
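A runnable rendering of Algorithm 1 might look like the sketch below. It departs from the pseudocode in one stated way: an explicit stack replaces the recursive call to REACH. The encoding of hyperarcs as `(head, tails)` pairs and the sample hypergraph are assumptions for illustration.

```python
def reachable(vertices, edges, sources):
    """edges: list of (head, tails). Returns the set of vertices y with X ~> y."""
    remaining = []                      # r[i]: distinct tails of edge i not yet reached
    adj = {v: [] for v in vertices}     # Adj[x]: indices of edges having x as a tail
    for i, (_, tails) in enumerate(edges):
        distinct = set(tails)
        remaining.append(len(distinct))
        for x in distinct:
            adj[x].append(i)
    reached = set()
    stack = list(sources)               # explicit stack instead of recursion
    while stack:
        y = stack.pop()
        if y in reached:
            continue
        reached.add(y)
        for i in adj[y]:
            remaining[i] -= 1           # each (edge, tail) pair is counted once
            if remaining[i] == 0:       # all tails reached: the edge fires
                stack.append(edges[i][0])
    return reached

V = {"x", "a", "b", "c", "d", "y"}
E = [("a", ("x",)), ("y", ("a", "b")), ("b", ("x", "a")), ("c", ("d",))]
print(sorted(reachable(V, E, {"x"})))
```

As in the pseudocode, a hyperarc with no tails never fires, so heads that need no premises should be given a tail in the source set (for grammars, the fictitious sink).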


Having eliminated parts of the hypergraph that aren't reachable from $X$, it still remains to further remove any parts
that don't contribute to reaching $y$. In Algorithm 3 we perform a simple depth-first traversal from heads to tails of
hyperarcs, starting with the destination $y$, ultimately saving only vertices that can help reach $y$.

To see how this works, let the restriction of hypergraph $G = (V, E)$ to a subset of its vertices $V' \subseteq V$ be
$G(V') = (V', E') : E' = \{e \in E \mid h_e \in V' \wedge T_e \subseteq V'\}$. First, run Algorithm 1 on $G$ to find $V' = \{v \in V \mid X \leadsto_G v\}$,
then second, run Algorithm 3 on the resulting restriction $G' = G(V')$ to find $V'' = \{v \in V' \mid \exists F \supseteq \{v\} : F \leadsto_{G'} y\}$.

Then the hypergraph $G'' = G'(V'')$ has the same hyperpath-trees $X \leadsto y$ as $G$, and is the minimal such.

The order of these steps is essential: there may be vertices that only help reach $y$ through hyperarcs that are
eliminated in Algorithm 1. In the second step, we qualify each node $t \in T_e$ that is connected through $e$ to $y$ as
participating in a path $X \leadsto_G h_e$ automatically, which is sound only if we can assume some path from $X$ to $t'$
for all $t' \in T_e$. But the first step guarantees this by removing all nodes that aren't reachable from $X$.

What we are really doing is reversing a hypergraph by interpreting it as a monadic graph consisting of all edges
formed by selecting just one tail of each hyperarc, and plugging in a default rule for completing the omitted siblings.
We can extend this strategy to the weighted case, using the shortest hyperpath-tree $X \leadsto v$ ($\pi[v]$, from
Algorithm 2) for each omitted sibling $v$. Then we can attribute to each monadic arc the cost of those omitted hyperpath-trees
($\beta[v]$), in addition to the cost of its original hyperarc. Then we can perform the usual single-source shortest graph paths
computation (Dijkstra, 1959) on this reverse monadic graph.

Since any subtree of a shortest hyperpath-tree $t \in (X \leadsto y)$ is a shortest hyperpath-tree from $X$ to its root-head
$h_{\mathrm{label}_t(())}$, we can decompose the shortest hyperpath-tree using node $v$ into the shortest inside $X \leadsto v$ plus the outside
$v \leadsto y$ formed by reconstituting a path in the monadic graph with the default interpretation of omitted siblings.
The outside part is an almost-hyperpath-tree, missing only an inside subtree for $X \leadsto v$ (an outside tree would be a
hyperpath-tree $X \cup \{v\} \leadsto y$). This is the insight behind the inside-outside algorithm (Lari and Young, 1990) for
training context-free string grammars, and also its extension to training tree transducers (Graehl and Knight, 2004).

Note that this decomposition means that the cost functions for hyperarcs must be separable into an independent 
Algorithm 2: ViterbiInside: single-source-set, multi-destination shortest hyperpath-trees.

Input:

A set of source nodes $X \subseteq V$ with initial costs $\{i_x, \forall x \in X\}$, and a hypergraph with $n$ nodes $V$, and $m$
hyperarcs $(e_1, \ldots, e_m)$ indexed by $1 \le i \le m$. Each hyperarc has tail nodes $T_i = T_{e_i} \subseteq V$, head
$h_i = h_{e_i} \in V$, and superior cost function $c_i = c_{e_i}$ ($f$ is superior iff $f(x_1, \ldots, x_k) \ge x_i, \forall 1 \le i \le k$
(Knuth, 1977)) of variables $T_i$. The cost functions are implemented by constant time operations
BIND($c_i$, $y \in T_i$, cost of $y$) and INF($c_i$), which returns a lower bound on the cost given the variables bound so
far.

For a context-free grammar or regular tree grammar, introduce a fictitious sink nonterminal $\omega$ to the rhs of
terminal rules. Now let $V$ be the nonterminals, and let $X$ be $\{\omega\}$. For the $i$th rule, let $h_i$ be the lhs
nonterminal and $T_i$ be the set of rhs nonterminals (or $\omega$ if there are none). Finally, initialize INF($c_i$) to
$w_i = -\log P(i \mid h_i)$, the negative log rule probability of rule $i$, and define BIND($c_i$, $y \in T_i$, $c$) as increasing
INF($c_i$) by $\#_i(y)\, c$, where $\#_i(t)$ is the number of occurrences of nonterminal $t$ in rule $i$.

Output:

For all $v \in V$, $\pi[v] = i$ is the index of the cheapest hyperarc with head $h_i = v$, giving the predecessor relation
of the cheapest unordered hyperpath-tree $X \leadsto v$, and $\beta[v]$ is the minimum cost of reaching $v$. $\pi[v] = 0$ if
there is no cost-improving edge to $v$. Time complexity is $O(n \lg n + t)$ (where $t$ is the total size of the input) if
a Fibonacci heap is used, or $O(m \lg n + t)$ if a binary heap is used.

begin
    for $y \in V$ do
        if $y \in X$ then $\beta[y] \leftarrow i_y$
        else $\beta[y] \leftarrow \infty$
        $\pi[y] \leftarrow 0$
        $Adj[y] \leftarrow \{\}$
    $Q \leftarrow$ HEAP-CREATE()
    for $x \in X$ do HEAP-INSERT($Q$, $x$, $i_x$)
    for $1 \le i \le m$, index of a hyperarc $(T_i = \{x_1, \ldots, x_k\}) \to^{c_i} \{h_i\}$ do
        $r[i] \leftarrow k$
        /* r[i] is the number of tail nodes remaining before edge i fires. */
        for $1 \le j \le k$ do $Adj[x_j] \leftarrow Adj[x_j] \cup \{i\}$
    while $Q \ne \emptyset$ do
        $y \leftarrow$ HEAP-EXTRACT-MIN($Q$)
        for $i \in Adj[y]$ do
            /* edge i with y as a tail */
            if INF($c_i$) $< \beta[h_i]$ then
                BIND($c_i$, $y$, $\beta[y]$)
                $r[i] \leftarrow r[i] - 1$
                if $r[i] = 0$ then
                    $c \leftarrow$ INF($c_i$)
                    if $c < \beta[h_i]$ then
                        if $\beta[h_i] = \infty$ then HEAP-INSERT($Q$, $h_i$, $c$)
                        else HEAP-DECREASE-KEY($Q$, $h_i$, $c$)
                        $\pi[h_i] \leftarrow i$
                        $\beta[h_i] \leftarrow c$
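The following Python sketch follows the structure of Algorithm 2 for the usual additive cost function with tail multiplicities. It is an illustration under stated assumptions rather than a transcription: Python's `heapq` has no decrease-key, so stale heap entries are skipped instead (lazy deletion); the early cut-off test on INF is omitted; lengths and source costs are assumed nonnegative; and `None` stands in for the pseudocode's $\pi[v] = 0$.

```python
import heapq
from collections import Counter

def viterbi_inside(vertices, edges, source_costs):
    """edges: list of (head, tails, length), tails a sequence (repeats allowed).
    source_costs: {x: initial cost} for x in X.
    Returns (beta, pi): cheapest cost per vertex, and the index of the edge
    achieving it (None for sources and unreachable vertices)."""
    beta = {v: float("inf") for v in vertices}
    pi = {v: None for v in vertices}
    adj = {v: [] for v in vertices}
    counts = [Counter(tails) for _, tails, _ in edges]
    remaining = [len(c) for c in counts]           # r[i]
    partial = [length for _, _, length in edges]   # INF(c_i) with nothing bound
    for i, c in enumerate(counts):
        for x in c:
            adj[x].append(i)
    heap = []
    for x, cost in source_costs.items():
        beta[x] = cost
        heapq.heappush(heap, (cost, x))
    done = set()
    while heap:
        _, y = heapq.heappop(heap)
        if y in done:
            continue                               # stale entry: skip it
        done.add(y)
        for i in adj[y]:
            partial[i] += counts[i][y] * beta[y]   # BIND(c_i, y, beta[y])
            remaining[i] -= 1
            if remaining[i] == 0:                  # all tails bound: edge fires
                head = edges[i][0]
                if partial[i] < beta[head]:
                    beta[head] = partial[i]
                    pi[head] = i
                    heapq.heappush(heap, (partial[i], head))
    return beta, pi

V = {"w", "A", "S"}
E = [("A", ("w",), 1.0), ("S", ("A", "A"), 0.5), ("S", ("w",), 3.0)]
beta, pi = viterbi_inside(V, E, {"w": 0.0})
print(beta["S"], pi["S"])
```

In this example `S` is first reached directly from the sink at cost 3.0, then improved to 2.5 through the edge that uses `A` twice.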
Algorithm 3: Single-destination hypergraph reachability

Input:

A destination node $y \in V$ in a hypergraph $G = (V, E)$, with $n$ nodes $V$, and $m$ hyperarcs $E = \{e_1, \ldots, e_m\}$
indexed by $1 \le i \le m$. Each hyperarc has tail nodes $T_i = T_{e_i} \subseteq V$ and head $h_i = h_{e_i} \in V$.

Output:

For all $x \in V$, $A[x]$ = true if there is a hyperpath-tree $X \leadsto y$ such that $x \in X$, false otherwise. Time
complexity is $O(t)$ where $t$ is the total size of the input (this is simple depth-first search on the projected regular
graph).

begin
    for $x \in V$ do $A[x] \leftarrow$ false
    USE($y$)

USE($y$) = begin
    $A[y] \leftarrow$ true
    for each hyperarc $i$ with $h_i = y$ do
        for $t \in T_i$ do
            if $\neg A[t]$ then USE($t$)
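Algorithm 3 amounts to a depth-first search from heads to tails. A minimal Python sketch, again with an explicit stack in place of recursion and with hyperarcs encoded as `(head, tails)` pairs (both choices are assumptions for the example), is:

```python
def useful(edges, destination):
    """edges: list of (head, tails). Returns the vertices found by walking
    hyperarcs backwards (head to tails) starting from the destination."""
    by_head = {}
    for head, tails in edges:
        by_head.setdefault(head, []).append(tails)
    seen = set()
    stack = [destination]
    while stack:
        y = stack.pop()
        if y in seen:
            continue
        seen.add(y)                       # A[y] <- true
        for tails in by_head.get(y, []):  # every hyperarc with head y
            stack.extend(t for t in tails if t not in seen)
    return seen

E = [("a", ("x",)), ("y", ("a", "b")), ("b", ("x", "a")), ("c", ("x",))]
print(sorted(useful(E, "y")))
```

The result is only meaningful as "vertices that help reach the destination" when the input has already been restricted to what the sources can reach, as the discussion of step order explains.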


sum over parts due to the tails and a part due to the arc. 

In Algorithm 4 we implicitly perform this reversal and monadification of a hypergraph and obtain for each vertex
$v$ the cheapest way to complete the hyperpath-tree $X \leadsto v$ into $X \leadsto v \leadsto y$ (by that we mean adjoining some inside
hyperpath-tree $X \leadsto v$ with the outside $v \leadsto y$), using parent $\psi[v]$ with total outside cost (leaving out the cost of $X \leadsto v$) $\alpha[v]$.

Then, the utility of $v$, or the cost of the cheapest hyperpath-tree using it, is just $\gamma[v] = \alpha[v] + \beta[v]$ and the utility
of hyperarc $e$ is $\gamma[e] = \alpha[h_e] + l_e + \sum_{(t, m) \in T_e} m\, \beta[t]$. It is then easy to select vertices and edges for removal based
on some criteria on their utility relative to the cost of the cheapest hyperpath-tree $X \leadsto y$, which is $\beta[y]$.

Algorithm 5 selects the minimal subset of the hyperarcs and vertices necessary to include the best hyperpath-tree
$X \leadsto y$ with cost $\beta[y]$ and all hyperpath-trees with cost no worse than $\beta[y] + \delta$.
Algorithm 4: ViterbiOutside: single-destination, shortest outside hyperpath-trees

Input:

A destination $y \in V$ and default (inside) costs $\beta[v]$ for reaching each $v \in V$ from $X$ (computed with
ViterbiInside), for a hypergraph with $n$ nodes $V$, and $m$ hyperarcs $(e_1, \ldots, e_m)$ indexed by $1 \le i \le m$.
Each hyperarc has length (i.e. cost to use) $l_i = l_{e_i}$, a multiset of tails $T_i = T_{e_i} \in \mathcal{M}(V)$, and head
$h_i = h_{e_i} \in V$. The cost for hyperpath-tree $X \leadsto h_e$ using edge $e$ and the best hyperpath-trees from $X$ to
each of its tails $t$ with cost $\beta[t]$ is $c_e = l_e + \sum_{(t, m) \in T_e} m\, \beta[t]$ (where $m$ is the number of occurrences of $t$ in the
tails), but other cost functions are possible; what is important is the ability to build up the cost for using an
edge assuming the default for its tails, and later subtract out the contribution from the default of a single
instance of a tail.

Output:

For all $v \in V$, $\psi[v]$ is the index of the hyperarc used to reach $y$ from $v$ (or 0 if none was taken) with the
minimum outside cost $\alpha[v]$, given by assuming the default cost way was used to reach its siblings
from $X$ (so $\alpha[v] + \beta[v] \ge \beta[y]$, with equality when $v$ lies on a cheapest hyperpath-tree). Time complexity is
$O(n \lg n + t)$ (where $t$ is the total size of the input) if a Fibonacci heap is used, or
$O(m \lg n + t)$ if a binary heap is used.

begin
    for $x \in V$ do
        $\psi[x] \leftarrow 0$
        $\alpha[x] \leftarrow \infty$
        $Adj^{-1}[x] \leftarrow \{\}$
    for $1 \le i \le m$, index of a hyperarc $(T_i = \{x_1, \ldots, x_k\}) \to^{l_i} \{h_i\}$ do
        $Adj^{-1}[h_i] \leftarrow Adj^{-1}[h_i] \cup \{i\}$
    $\alpha[y] \leftarrow 0$
    $Q \leftarrow$ HEAP-CREATE()
    HEAP-INSERT($Q$, $y$, 0)
    while $Q \ne \emptyset$ do
        $x \leftarrow$ HEAP-EXTRACT-MIN($Q$)
        for $i \in Adj^{-1}[x]$ do
            /* edge i with x as a head */
            $c \leftarrow \alpha[x] + l_i + \sum_{(t, m) \in T_i} m\, \beta[t]$   /* c = total cost of X ~> e_i ~> y */
            for $t \in T_i$ do
                $c' \leftarrow c - \beta[t]$   /* c' is the proposed improved outside cost for t through e_i, removing X ~> t */
                if $c' < \alpha[t]$ then
                    if $\alpha[t] = \infty$ then HEAP-INSERT($Q$, $t$, $c'$)
                    else HEAP-DECREASE-KEY($Q$, $t$, $c'$)
                    $\psi[t] \leftarrow i$
                    $\alpha[t] \leftarrow c'$
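A Python sketch of the outside computation is given below, under the same hedges as for the inside pass: `heapq` with skipped stale entries replaces decrease-key, the additive cost with tail multiplicities is assumed, `None` stands in for "no hyperarc taken", and the inside costs `beta` are supplied directly (here hard-coded for a small hypothetical hypergraph) and assumed finite for every vertex.

```python
import heapq
from collections import Counter

def viterbi_outside(vertices, edges, beta, destination):
    """edges: list of (head, tails, length); beta: inside cost of each vertex.
    Returns (alpha, psi): cheapest outside cost per vertex and the index of
    the hyperarc through which it is achieved (None if none was taken)."""
    alpha = {v: float("inf") for v in vertices}
    psi = {v: None for v in vertices}
    by_head = {v: [] for v in vertices}            # Adj^{-1}
    for i, (head, _, _) in enumerate(edges):
        by_head[head].append(i)
    alpha[destination] = 0.0
    heap = [(0.0, destination)]
    done = set()
    while heap:
        _, x = heapq.heappop(heap)
        if x in done:
            continue                               # stale entry: skip it
        done.add(x)
        for i in by_head[x]:                       # edge i with x as its head
            _, tails, length = edges[i]
            counts = Counter(tails)
            # Cost of a whole tree through edge i, default insides for all tails.
            total = alpha[x] + length + sum(m * beta[t] for t, m in counts.items())
            for t in counts:
                proposed = total - beta[t]         # remove one inside copy of t
                if proposed < alpha[t]:
                    alpha[t] = proposed
                    psi[t] = i
                    heapq.heappush(heap, (proposed, t))
    return alpha, psi

V = {"w", "A", "S"}
E = [("A", ("w",), 1.0), ("S", ("A", "A"), 0.5), ("S", ("w",), 3.0)]
beta = {"w": 0.0, "A": 1.0, "S": 2.5}              # assumed inside costs
alpha, psi = viterbi_outside(V, E, beta, "S")
print(alpha, psi)
```

For the tail `A`, which occurs twice in its hyperarc, only one occurrence is subtracted; the sibling occurrence keeps its default inside cost.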





Algorithm 5: Prune relatively-useless vertices and hyperarcs

Input:

$\beta[v]$ and $\alpha[v]$, the Viterbi inside and outside costs of each vertex $v \in V$ over all hyperpath-trees $X \leadsto y$
(computed with ViterbiInside and ViterbiOutside) in a hypergraph $G = (V, E)$ with $m$ hyperarcs
$E = \{e_1, \ldots, e_m\}$ indexed by $1 \le i \le m$. Each hyperarc has tail nodes $T_i = T_{e_i} \subseteq V$ and head $h_i = h_{e_i} \in V$.
The cost for hyperpath-tree $X \leadsto h_e$ using edge $e$ and the best hyperpath-trees from $X$ to each of its tails $t$
with cost $\beta[t]$ is $c_e = l_e + \sum_{t \in T_e} m_t\, \beta[t]$, where $l_e$ is the weight on hyperarc $e$ and $m_t$ is a weight, e.g. the
number of occurrences of $t$ in the rhs of a grammar production.

$\delta$ is a beam (cost distance from the best hyperpath-tree).

Output:

For all $x \in V \cup E$, $\gamma[x]$ is the cost of the best hyperpath-tree $t \in (X \leadsto y)$ such that $x$ is used in $t$, or $\infty$ if
none exists. $\kappa[x]$ = true iff that cost is no more than $\delta$ worse than the best, $\beta[y]$.

Time complexity is $O(t)$ where $t$ is the total size of the input (total complexity including ViterbiInside is
$O(n \lg n + t)$).

begin
    $l \leftarrow \beta[y] + \delta$
    for $v \in V$ do
        $\gamma[v] \leftarrow \beta[v] + \alpha[v]$
    for $e \in E$ do
        $\gamma[e] \leftarrow \alpha[h_e] + l_e + \sum_{t \in T_e} m_t\, \beta[t]$
    for $x \in V \cup E$ do $\kappa[x] \leftarrow (\gamma[x] \le l)$
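The pruning step itself is a single pass. The sketch below assumes $m_t$ is the number of occurrences of $t$ among the tails, takes `alpha` and `beta` as given (the values shown are for a small hypothetical hypergraph), and returns the kept vertices and hyperarc indices rather than a boolean array $\kappa$.

```python
from collections import Counter

def prune(vertices, edges, alpha, beta, destination, beam):
    """edges: list of (head, tails, length).
    Returns (kept_vertices, kept_edge_indices, gamma_vertex, gamma_edge)."""
    limit = beta[destination] + beam
    gamma_vertex = {v: alpha[v] + beta[v] for v in vertices}
    gamma_edge = []
    for head, tails, length in edges:
        inside = sum(m * beta[t] for t, m in Counter(tails).items())
        gamma_edge.append(alpha[head] + length + inside)
    kept_vertices = {v for v in vertices if gamma_vertex[v] <= limit}
    kept_edges = [i for i, g in enumerate(gamma_edge) if g <= limit]
    return kept_vertices, kept_edges, gamma_vertex, gamma_edge

V = {"w", "A", "S"}
E = [("A", ("w",), 1.0), ("S", ("A", "A"), 0.5), ("S", ("w",), 3.0)]
beta = {"w": 0.0, "A": 1.0, "S": 2.5}    # assumed inside costs
alpha = {"w": 2.5, "A": 1.5, "S": 0.0}   # assumed outside costs
print(prune(V, E, alpha, beta, "S", 0.25)[1])
print(prune(V, E, alpha, beta, "S", 0.5)[1])
```

With a beam of 0.25 the direct hyperarc of cost 3.0 is dropped because the best tree costs 2.5; widening the beam to 0.5 keeps it.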





